ABSTRACT

An estimator in the extended class of Stein estimators has two undesirable
properties. For a small value of the prior guess, it ignores the data. Moreover, in some
cases its risk is not uniformly smaller than that of the Stein estimator. We show that
there exists a lower bound on $\tau(S)$ that guarantees a smaller risk, and the resulting
estimator does not ignore the data.


1.  Introduction  and  Motivation. 


Consider the problem of estimating the mean vector $\theta$ of a $p \ge 3$ dimensional
multivariate normal distribution on the basis of a sample $X \sim N_p(\theta, I)$. Under
squared error loss, the maximum likelihood estimator $\delta_0(X) = X$ has risk $R(\theta, \delta_0) = p$
for every $\theta$. James and Stein (1960) showed that the estimator

(1.1)  $\delta_{JS}(X) = \left(1 - \frac{p-2}{S}\right)X, \quad S = \|X\|^2 = \sum_{i=1}^{p} X_i^2,$

has risk $R(\theta, \delta_{JS}) < p$ for every $\theta$. Even though it is uniformly better than the MLE, we
cannot use this estimator when $S < p - 2$, because the shrinkage $(p-2)/S$ then exceeds one
and reverses the sign of $X$. Our main purpose in this article is
to show that there exists a class of estimators, incorporating prior knowledge, whose
members have smaller risk than the Stein estimator. They also give
better protection against misspecification of the prior knowledge.

Sclove (1968) made an improvement using only the positive part of the Stein
estimator. It is

$\delta_{JS}^{+}(X) = \left(1 - \min\left\{1, \frac{p-2}{S}\right\}\right)X, \quad S > 0,$

which satisfies the Baranchik (1970) conditions for minimax estimators. Efron and
Morris (1973) interpreted the original estimation problem for the normal location
parameter vector as an estimation problem for the hyper-parameter of a normal
prior distribution, $\theta \sim N_p(0, B^{-1}(1-B)I)$ with $B \in (0, 1)$, under a "Relative Savings
Loss" (RSL)

(1.2)  $RSL(B, \delta) = \frac{R(B, \delta) - R(B, \delta^*)}{R(B, \delta_0) - R(B, \delta^*)},$

where $R(B, \delta)$ is the expected risk of an estimator $\delta$ and $\delta^*(X) = (1 - B)X$. They
minimized $E_g RSL(B, \delta)$ over the Baranchik class of minimax estimators and then
derived the extended class of Stein estimators,

(1.3)  $\delta_b^{c+}(X) = \left(1 - \min\left\{b, \frac{c(p-2)}{S}\right\}\right)X, \quad 1 \le c \le 2, \quad 0 < b \le 1,$

where $E_g(\cdot)$ indicates expectation with respect to a hyper prior distribution $g(B)$
for the hyper-parameter $B$ of the normal prior. We note that $\delta_{JS}^{+}$ is a member of
the extended class of Stein estimators. The estimators $\delta_b^{c+}$ are not comparable with
Stein estimators unless $b = 1 = c$. That is, for $b \ne 1$ and $c \ne 1$, there exist $\theta_1$ and
$\theta_2$ ($\theta_1 \ne \theta_2$) such that $R(\theta_1, \delta_{JS}) < R(\theta_1, \delta_b^{c+})$ while $R(\theta_2, \delta_{JS}) > R(\theta_2, \delta_b^{c+})$. Only the
positive part Stein estimator is uniformly better (in terms of risk) than $\delta_{JS}$.
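The three shrinkage rules above can be compared on a small example. The following Python sketch evaluates (1.1), the positive part rule and (1.3) for a sample with $S < p - 2$; the function names and the sample vector are illustrative assumptions and do not come from the original development.

```python
def js_shrinkage(S, p):
    """Shrinkage (p - 2)/S of the James-Stein estimator (1.1)."""
    return (p - 2) / S


def positive_part_shrinkage(S, p):
    """Shrinkage min{1, (p - 2)/S} of the positive part estimator."""
    return min(1.0, (p - 2) / S)


def extended_shrinkage(S, p, b, c=1.0):
    """Shrinkage min{b, c(p - 2)/S} of the extended class (1.3)."""
    return min(b, c * (p - 2) / S)


def shrink(x, factor):
    """Apply the estimator (1 - factor) * X componentwise."""
    return [(1.0 - factor) * xi for xi in x]


# Hypothetical sample with p = 4 and S = 1 < p - 2 = 2.
x = [0.5, -0.5, 0.5, 0.5]
p = len(x)
S = sum(xi * xi for xi in x)

js = shrink(x, js_shrinkage(S, p))                # shrinkage 2: every sign flips
pp = shrink(x, positive_part_shrinkage(S, p))     # shrinkage 1: estimate is 0
ext = shrink(x, extended_shrinkage(S, p, b=0.3))  # shrinkage capped at b = 0.3
```

With this sample the plain Stein rule overshoots the origin, the positive part rule returns the zero vector, and the extended rule with $b = 0.3$ returns $0.7X$.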

A natural question arises at this point. Suppose we have prior knowledge
(held with strong belief) which can be expressed by $N_p(0, b^{-1}(1-b)I)$ with known $b$,
and we want the estimator we use to have smaller risk than the Stein
estimator. Do we need to ignore the prior guess $b$ and use $\delta_{JS}$ even though we
believe that the linear estimator $(1 - b)X$ is correct? This motivates us to seek
an improved estimator which allows the use of prior knowledge.

Hence we have two criteria. The first is that an improved estimator,
incorporating prior information, should have smaller risk than the Stein estimator.
Stein (1973) proposed a class of estimators which may have members dominating the
positive part estimator. Efron and Morris (1976) gave a general class (larger
than Alam's (1973)) of minimax estimators which allows $\tau(S)$ in $\delta = (1 - \frac{p-2}{S}\tau(S))X$ to
decrease. Conditions under which estimators with $\tau(S)$ strictly decreasing at some point
have smaller risk than $\delta_{JS}$ have not yet been found. We therefore restrict our attention
to the Baranchik class of minimax estimators with $\lim_{S \to \infty} \tau(S) = 1$, and we show that
there exists a lower bound for $\tau(S)$ which leads to a better estimator. This is done
in Theorem 2 in Section 2.

As the second criterion, an improved estimator must have good protection
against prior misspecification because we (almost) always have some useful
information about the problem other than the sample. A Bayesian may hope for
posterior robustness over all prior distributions while, in sampling theory,
risk robustness (thus minimaxity) is desired. A plausible compromise between
these two extremes may be Bayes risk robustness over all prior distributions.
It is very difficult, at least for us, to work with the class of all prior distributions; thus
we restrict the class of prior distributions to the class of normal distributions with
zero mean vector and covariance $B^{-1}(1-B)I$, indexed by the hyper-parameter $B$. We
therefore adopt the relative savings loss (defined in (1.2) with $N_p(0, B^{-1}(1-B)I)$),
which is a normalized version of the Bayes risk and a function of $B$ alone, as a
measure of protection against a wrong prior guess. Thus the estimator must have
smaller RSL than $\delta_{JS}$ over the region $B \in (0, 1)$.

In summary, an improved estimator of the form $\delta_b^H(X) = (1 - \frac{p-2}{S}\tau_H(b,S))X$ must
satisfy the following conditions.

Condition 1. $\tau_H(b, S)$ is nondecreasing in $S > 0$.

Condition 2. $0 \le \tau_H(b, S) \le \min\{1, S/(p-2)\}$, $S > 0$, $1 \ge b > 0$.

Condition 3. $R(\theta, \delta_b^H) \le R(\theta, \delta_{JS})$ for all $\theta$.

We note here that if Condition 3 is satisfied then $RSL(B, \delta_b^H) \le RSL(B, \delta_{JS})$ for all
$B/b > 0$, where $b$ is a prior guess for $B$. To choose the best one among estimators in

$D = \{\text{estimators which satisfy the above 3 conditions}\},$

we minimize $RSL(b, \delta)$ over the class $D$. We find that estimators with, for $2n = p - 2$,

(1.4)  $2n\,\tau_H(b,S) = \begin{cases} bS\, I_{(0,2n/b)}(S) + 2n\, I_{(2n/b,\infty)}(S), & n/(n+1) \le b \le 1, \\ \gamma_n(S)\, I_{(0,S_b)}(S) + bS\, I_{(S_b,2n/b)}(S) + 2n\, I_{(2n/b,\infty)}(S), & 0 < b < n/(n+1), \end{cases}$

satisfy the above three conditions and minimize $RSL(b, \delta)$ over the estimators in $D$.
This is done in Section 3. We note here that $S_b$ is the solution of $\gamma_n(s) = bs$ for
$b \in (0, n/(n+1))$ and that $\gamma_n(s)$ is defined by

(1.5)  $\gamma_n(s) = s \int_0^1 t^{n} \exp(-ts/2)\,dt \Big/ \int_0^1 t^{n-1} \exp(-ts/2)\,dt.$

Its properties are given in the appendix.
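For readers who want to evaluate (1.5) numerically, a minimal sketch using composite Simpson quadrature is given here. It assumes $n \ge 1$ so that both integrands are bounded on $[0, 1]$; the function name and the number of steps are illustrative choices, not part of the paper.

```python
import math


def gamma_n(s, n, steps=2000):
    """Evaluate gamma_n(s) of (1.5) by composite Simpson quadrature on [0, 1].

    Assumes n >= 1 and an even number of steps.
    """
    def simpson(f):
        h = 1.0 / steps
        total = f(0.0) + f(1.0)
        for i in range(1, steps):
            total += (4 if i % 2 else 2) * f(i * h)
        return total * h / 3.0

    numerator = simpson(lambda t: t ** n * math.exp(-t * s / 2.0))
    denominator = simpson(lambda t: t ** (n - 1) * math.exp(-t * s / 2.0))
    return s * numerator / denominator


# The ratio gamma_n(s)/s starts near n/(n + 1) for small s ...
ratio_near_zero = gamma_n(1e-6, 1) / 1e-6
# ... and gamma_n(s) stays below 2n as s grows.
value_large_s = gamma_n(20.0, 2)
```

For $n = 1$ the ratio near zero is close to $1/2$, and for $n = 2$ the value at $s = 20$ stays below $2n = 4$, in line with the properties listed in the appendix.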

2.  Main  Result. 

Estimators that incorporate prior knowledge and have smaller risk than the
Stein estimator are desired. For this purpose we start with an estimator with
absolutely continuous $\tau(s)$ in order to get a lower bound on $\tau$.

Theorem 1. (Efron and Morris (1976)). Suppose $\tau$ is absolutely continuous with
derivative $\tau'$. If the risk $R(\theta, \delta)$ is finite and if the expectation of each term in (2.1)
exists, then a unique unbiased estimator of $R(\theta, \delta)$ based on the sample $S$ exists
and is given by

(2.1)  $\hat{R}(\theta, \delta) = p - (p-2)\left[\frac{p-2}{S}\tau(S)(2 - \tau(S)) + 4\tau'(S)\right].$
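A short Python sketch of the unbiased risk estimate (2.1) follows. The caller supplies $\tau$ and its derivative as functions; the name `unbiased_risk_estimate` and the example inputs are hypothetical.

```python
def unbiased_risk_estimate(S, p, tau, dtau):
    """Unbiased risk estimate (2.1) for delta = (1 - (p - 2) tau(S) / S) X.

    tau and dtau are callables returning tau(S) and its derivative tau'(S).
    """
    t = tau(S)
    return p - (p - 2) * ((p - 2) / S * t * (2.0 - t) + 4.0 * dtau(S))


# tau identically 1 is the Stein estimator: the estimate is p - (p - 2)^2 / S.
stein_estimate = unbiased_risk_estimate(4.0, 6, lambda s: 1.0, lambda s: 0.0)
# tau identically 0 is the MLE: the estimate is p.
mle_estimate = unbiased_risk_estimate(4.0, 6, lambda s: 0.0, lambda s: 0.0)
```

With $p = 6$ and $S = 4$ the Stein case gives $6 - 16/4 = 2$, while the MLE case returns $p = 6$.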

This theorem implies that the nondecreasing condition on $\tau$ is not necessary for
an estimator to be minimax, but no convenient substitute for this condition has
been found. We therefore keep the nondecreasing condition in Baranchik's (1970)
theorem. The Baranchik class of minimax estimators is too large since it contains
some estimators that are not better than the Stein estimator. One way to guarantee
that the estimator we use is better than the Stein estimator is to make it satisfy
the conditions in the following theorem.

Theorem 2. If the absolutely continuous function $\tau(s)$ with derivative $\tau'(s)$ satisfies,
for any $S > 0$ and $p = 2(n+1) \ge 3$, the conditions

i) $\tau(\cdot)$ is non-decreasing,

ii) $\gamma_n(S)/2n \le \tau(S) \le \min\{S/2n, 1\}$,

then $R(\theta, \delta) \le R(\theta, \delta_{JS})$ for all $\theta$. Equality holds when $\|\theta\|^2 = 0$ and $\tau(S) = \gamma_n(S)/2n$.
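Proof. From Theorem 1,

(2.2)  $\begin{aligned} R(\theta, \delta) &= p - 2n E_\theta\left\{\frac{2n}{S}\tau(S)(2 - \tau(S)) + 4\tau'(S)\right\} \\ &= \left\{p - 2n E_\theta \frac{2n}{S}\right\} + 2n E_\theta\left\{\frac{2n}{S}(1 - \tau(S))^2 - 4\tau'(S)\right\} \\ &= R(\theta, \delta_{JS}) + 2n E_\theta\left\{\frac{2n}{S}(1 - \tau(S))^2 - 4\tau'(S)\right\}. \end{aligned}$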

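Thus it is enough to show that

(2.3)  $E_\theta\left[\frac{2n}{S}(1 - \tau(S))^2 - 4\tau'(S)\right] \le 0$ for all $\theta$.

Since $S$ is distributed as chi-square with $p$ degrees of freedom and noncentrality
parameter $\lambda = \|\theta\|^2/2$, the left side of (2.3) can be expressed as

(2.4)  $\sum_{k=0}^{\infty} \frac{e^{-\lambda}\lambda^k}{k!} E_{p+2k}\left[\frac{2n}{S}(1 - \tau(S))^2 - 4\tau'(S)\right] = \sum_{k=0}^{\infty} \frac{e^{-\lambda}\lambda^k}{k!} R_k(\tau),$

where $E_{p+2k}(\cdot)$ indicates the expectation with respect to the central chi-square
distribution with $p + 2k$ degrees of freedom. For the chi-square distribution, it can be
shown by integration by parts that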


(2.5)  $E_m\{(S - m)h(S)\} = 2E_m\{S h'(S)\}$

for $S \sim \chi^2_m$ and $h(\cdot)$ such that all expectations in (2.5) exist. Using this equality,
$R_k(\tau)$ in (2.4) can be rewritten as

$R_k(\tau) = \frac{n}{n+k}E_{2(n+k)}\{1 - \tau(S)\}^2 - \frac{1}{n+k}E_{2(n+k)}\{[S - 2(n+k)]\tau(S)\}$

since

$4E_{m+2}(\tau'(S)) = \frac{4}{m}E_m(S\tau'(S)) = \frac{2}{m}E_m\{[S - m]\tau(S)\}$

with $m = 2(n+k)$. Further, by integration by parts,

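(2.6)  $\begin{aligned} R_k(\tau) &= \frac{1}{2(n+k)}\left\{2n + 4kE_{2(n+k)}\tau(S) + 2nE_{2(n+k)}(\tau(S))^2 - 4(n+k)E_{p+2k}\tau(S)\right\} \\ &\propto n(c-1)^2 + \int \left\{2(n+k)F_{p+2k}(s) - (2k + 2n\tau(s))F_{2(n+k)}(s)\right\} d\tau(s), \end{aligned}$

where $c = \lim_{s \to \infty} \tau(s) = 1$ from conditions i) and ii); thus the first term vanishes. The
integrand in the second term can be rewritten as

(2.7)  $F_{2(n+k)}(s) \cdot \left[2(n+k)\frac{F_{2(n+k+1)}(s)}{F_{2(n+k)}(s)} - 2k - 2n\tau(s)\right],$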
where $\gamma_{n+k}(s)/2(n+k) = F_{2(n+k+1)}(s)/F_{2(n+k)}(s)$ with $\gamma_n(\cdot)$ defined in (1.5). (See the
appendix for several properties of the $\gamma_n(\cdot)$ function.) We also know that $2 - \gamma_{n+1}(s) + \gamma_n(s) > 0$
for every $n > 0$ and $s > 0$. Thus the expression in brackets in (2.7) can be
rewritten as

$\begin{aligned} \gamma_{n+k}(s) - 2k - \gamma_n(s) + \gamma_n(s) - 2n\tau(s) &= \gamma_n(s) - 2n\tau(s) - \sum_{j=1}^{k}\left(2 + \gamma_{n+j-1}(s) - \gamma_{n+j}(s)\right) \\ &\le \gamma_n(s) - 2n\tau(s) \\ &\le 0 \quad \text{from condition (ii).} \end{aligned}$

Combining this and condition (i) of nondecreasing $\tau(s)$, the expression for $R_k(\tau)$ in
(2.6) gives $R_k(\tau) \le 0$ for every integer $k \ge 0$; thus the theorem is proved. We note
that, when $\|\theta\|^2 = 0$ (i.e. $\lambda = 0$), if we choose $\tau(s) = \gamma_n(s)/2n$ then the quantity in
brackets in (2.7) becomes

$\gamma_n(s) - 2n\tau(s) = \gamma_n(s) - \gamma_n(s) = 0$

and the Poisson random variable has nonzero weight only when $k = 0$. This proves
the assertion of the equality. Q.E.D.
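The key step of the proof, that the bracket $\gamma_{n+k}(s) - 2k - 2n\tau(s)$ is nonpositive at the lower bound $\tau(s) = \gamma_n(s)/2n$, can be checked numerically. The sketch below is only an illustration: it evaluates $\gamma_n$ through the series $\gamma_n(s)/2n = T/(1+T)$ with $T = \sum_{j \ge 1} (s/2)^j\, n!/(n+j)!$, which follows from the chi-square ratio in property (vii) of the appendix, and the helper names are assumptions.

```python
def gamma_n(s, n):
    """gamma_n(s) via gamma_n/(2n) = T/(1 + T), T = sum_{j>=1} (s/2)^j n!/(n+j)!.

    Intended for moderate s (roughly s/2 below 700, to avoid overflow).
    """
    x, term, tail, j = s / 2.0, 1.0, 0.0, 0
    while True:
        term *= x / (n + j + 1)
        tail += term
        j += 1
        if term < 1e-17 * (1.0 + tail):
            break
    return 2.0 * n * tail / (1.0 + tail)


def bracket_at_lower_bound(s, n, k):
    """Bracket gamma_{n+k}(s) - 2k - 2n*tau(s) with tau(s) = gamma_n(s)/(2n)."""
    tau = gamma_n(s, n) / (2.0 * n)
    return gamma_n(s, n + k) - 2.0 * k - 2.0 * n * tau


grid = [0.5 * i for i in range(1, 61)]
worst = max(bracket_at_lower_bound(s, 2, k) for s in grid for k in range(6))
```

On this grid the largest value of the bracket is zero up to rounding, and it is attained at $k = 0$, which matches the equality case of Theorem 2.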

3.  Construction. 

If $\tau(s)$ is not absolutely continuous then the expression (2.1) does not exist.
However, the lower bound $\gamma_n(s)/2n$ for $\tau(s)$ is very useful in searching for a better
estimator. Define

(2.8)  $\tau(s) = \begin{cases} \sum_{i=1}^{3} \tau_i(s) I_{(S_{i-1}, S_i]}(s), & \text{if } S_i \text{ exists for } i = 1, 2, \\ \tau_1(s), & \text{otherwise,} \end{cases}$

where $\tau_i(S_i) = \tau_{i+1}(S_i)$ for $i = 1, 2$, $S_0 = 0$, $S_3 = \infty$ and $\tau_i(s)$ are absolutely continuous
in $(S_{i-1}, S_i]$. Then, with some conditions on $\tau_i(s)$, such $\tau(s)$ gives a smaller risk than
that of the Stein estimator.

Theorem 3. An estimator $\delta(X) = (1 - 2n\tau(S)/S)X$ with $\tau(s)$ defined in (2.8) has smaller
risk than that of the Stein estimator if, for any value of $s$, $\gamma_n(s)/2n \le \tau(s) \le \min\{1, s/2n\}$,
where $p = 2(n+1)$ and $\tau(s)$ is nondecreasing.

Proof. If $S_i$ does not exist for $i = 1, 2$ then, since $\gamma_n(s)/2n \le \tau_1(s) = \tau(s) \le \min\{1, s/2n\}$
for any $s > 0$ and $\tau_1(s)$ is absolutely continuous, the theorem follows from
Theorem 2. Otherwise, the risk difference can be expressed as

$R(\theta, \delta) - R(\theta, \delta_{JS}) = \sum_{j=0}^{\infty} \frac{1}{j!} e^{-\lambda}\lambda^{j} R_j(\tau)$
i= 0 
where

(2.9)  $\begin{aligned} R_j(\tau) &= E_{p+2j}\left[S\left(1 - \frac{2n}{S}\tau(S)\right)^2 - 4j\left(1 - \frac{2n}{S}\tau(S)\right) + 2j\right] - p + \frac{4n^2}{2(n+j)} \\ &\propto n\left(1 - 2E_{p+2j}\tau(s) + E_{2(n+j)}[\tau(s)]^2\right) - 2j\left(E_{p+2j}\tau(s) - E_{2(n+j)}\tau(s)\right). \end{aligned}$

We know that if $F_0$, $F_1$ are two cumulative distribution functions on the real line such
that $F_1(x) \le F_0(x)$ for all $x$, then $E_0 h(X) \le E_1 h(X)$ for any nondecreasing function $h(\cdot)$.
Thus, $E_{p+2j}\tau(s) - E_{p+2j-2}\tau(s) \ge 0$ since our $\tau(s)$ is nondecreasing and $F_{p+2j} \le F_{p+2j-2}$.
It is, therefore, enough to show that the first term in (2.9) is nonpositive. For $k = 3$,


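(2.10)  $\begin{aligned} R_{j1} &= 1 - 2E_{p+2j}\tau(s) + E_{p+2j-2}(\tau(s))^2 \\ &= 1 - \sum_{i=1}^{3}\left\{2\int_{S_{i-1}}^{S_i}\tau_i(s)\,dF_{p+2j}(s) - \int_{S_{i-1}}^{S_i}(\tau_i(s))^2\,dF_{p+2j-2}(s)\right\}. \end{aligned}$

We note here that, for all $s \in (S_2, \infty)$, $\tau_3(s) = 1$. In each interval $(S_{i-1}, S_i]$, (2.10) can
be rewritten by integration by parts as

$\begin{aligned} R_{j1} &= 1 - 2\sum_{i=1}^{3}\left\{F_2(S_i)\tau_i(S_i) - F_2(S_{i-1})\tau_i(S_{i-1}) - \int_{S_{i-1}}^{S_i}F_2(s)\,d\tau_i(s)\right\} \\ &\quad + \sum_{i=1}^{3}\left\{F_1(S_i)[\tau_i(S_i)]^2 - F_1(S_{i-1})[\tau_i(S_{i-1})]^2 - 2\int_{S_{i-1}}^{S_i}\tau_i(s)F_1(s)\,d\tau_i(s)\right\} \\ &\propto \sum_{i=1}^{3}\int_{S_{i-1}}^{S_i}\left(F_2(s) - F_1(s)\tau_i(s)\right)d\tau_i(s), \end{aligned}$

where $F_1(s) = F_{p+2j-2}(s)$ and $F_2(s) = F_{p+2j}(s)$. By property (vii), $F_2(s)/F_1(s) = \gamma_{n+j}(s)/2(n+j)$.
From property (viii) of the $\gamma_n(\cdot)$ function (see the appendix),
$\gamma_{n+j}(s)/2(n+j) \le \gamma_n(s)/2n \le \tau_i(s)$ for all $s \in (S_{i-1}, S_i]$, $i = 1, 2, 3$,
so each integrand is nonpositive. This proves the result for $k = 3$. For $k = 2$, it can be shown easily by putting
$S_2 = \infty$ and $\tau_2(s) = 1$ for all $s > S_1$. Q.E.D.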

Any estimator defined in Theorem 3 has uniformly smaller risk than the
Stein estimator. Thus, from the definition, so does the RSL. To choose the best
among them, we minimize $RSL(b, \delta)$ under the restrictions that $\gamma_n(S)/2n \le \tau(S) \le \min\{1, S/2n\}$
and that $\tau(s)$ is nondecreasing. Let $S_b$ be a solution of $\gamma_n(s)/s = b$.


Theorem 4. The estimator in the class of estimators defined in Theorem 3 which
minimizes $RSL(b, \delta)$ is given by $\delta_b^H(X) = (1 - \frac{2n}{S}\tau_H(b,S))X$, where

$\tau_H(b,S) = \begin{cases} \frac{\gamma_n(S)}{2n} I_{(0,S_b)}(S) + \frac{bS}{2n} I_{(S_b,2n/b)}(S) + I_{(2n/b,\infty)}(S), & \text{if } S_b \text{ exists,} \\ \frac{bS}{2n} I_{(0,2n/b)}(S) + I_{(2n/b,\infty)}(S), & \text{otherwise.} \end{cases}$
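A minimal Python sketch of $\tau_H(b, S)$ is given here, under stated assumptions: $S_b$ is found by bisection on $(0, 2n/b)$, $\gamma_n$ is evaluated through a series that follows from property (vii) of the appendix, and all function names are hypothetical. The sketch also compares the piecewise form with the compact form $\max\{\gamma_n(S)/2n, \min\{bS/2n, 1\}\}$, which should agree with it because $\gamma_n(S)/S$ is decreasing in $S$.

```python
def gamma_n(s, n):
    """gamma_n(s) via gamma_n/(2n) = T/(1 + T), T = sum_{j>=1} (s/2)^j n!/(n+j)!."""
    x, term, tail, j = s / 2.0, 1.0, 0.0, 0
    while True:
        term *= x / (n + j + 1)
        tail += term
        j += 1
        if term < 1e-17 * (1.0 + tail):
            break
    return 2.0 * n * tail / (1.0 + tail)


def solve_S_b(b, n, iters=80):
    """Bisection for gamma_n(s)/s = b; returns None when b >= n/(n + 1)."""
    if b >= n / (n + 1.0):
        return None
    lo, hi = 1e-9, 2.0 * n / b   # gamma_n(s)/s is decreasing and below b at 2n/b
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gamma_n(mid, n) / mid > b:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def tau_H(b, S, n):
    """Piecewise tau_H(b, S) as in Theorem 4."""
    S_b = solve_S_b(b, n)
    if S_b is not None and S < S_b:
        return gamma_n(S, n) / (2.0 * n)
    if S < 2.0 * n / b:
        return b * S / (2.0 * n)
    return 1.0


def tau_H_clipped(b, S, n):
    """Compact form: bS/2n clipped to [gamma_n(S)/2n, 1]."""
    return max(gamma_n(S, n) / (2.0 * n), min(b * S / (2.0 * n), 1.0))


grid = [0.1 * i for i in range(1, 201)]
max_gap = max(abs(tau_H(b, S, 1) - tau_H_clipped(b, S, 1))
              for b in (0.2, 0.8) for S in grid)
```

The compact form makes the constrained minimization easy to read: the unconstrained minimizer $bS/2n$ is pushed up to the lower bound when it falls below it and capped at one.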
Proof. We note here that the condition that $S_b$ exists can be replaced by
$0 < b < (p-2)/p$ since, for any $s > 0$, $\gamma_n(s)/s < (p-2)/p$. It can be shown that

$\begin{aligned} RSL(b, \delta) &= E_{p+2}\left\{\left(\frac{2n}{bS}\tau(S) - 1\right)^2 \Big|\, b\right\} \\ &= E_{p+2}\left\{\left(\frac{2n}{bS}\right)^2\left(\tau(S) - \frac{bS}{2n}\right)^2 \Big|\, b\right\}, \end{aligned}$

where the expectation is taken with $bS \sim \chi^2_{p+2}$,
is minimized at $\tau(S) = bS/2n$. Imposing the restriction that $\gamma_n(S) \le 2n\tau(S) \le \min\{S, 2n\}$,
we get $RSL(b, \delta_b^H) = \min RSL(b, \delta)$, where the minimization is over the estimators
defined in this theorem.


4.  Evaluation  and  Comments. 

1. The lower bound $\gamma_n(S)/S$ of the shrinkage has an interesting property when $S$
approaches zero. In the Bayesian framework with normal prior $N_p(0, B^{-1}(1-B)I)$,
the marginal distribution of $X$ is $N_p(0, B^{-1}I)$. When we have $X = 0$ (thus $S = 0$),
the (empirical) Bayesian estimator of $B$ will be

$\begin{aligned} E(B \mid X = 0) &= \left.\frac{\int_0^1 B \cdot B^{n+1}\exp(-BS/2)\,g(B)\,dB}{\int_0^1 B^{n+1}\exp(-BS/2)\,g(B)\,dB}\right|_{S=0} \\ &= \int_0^1 B^{n+2}g(B)\,dB \Big/ \int_0^1 B^{n+1}g(B)\,dB \le 1, \end{aligned}$

where the equality holds if and only if the hyper prior $g(B)$ is concentrated at $B = 1$.
It depends only on the prior information. When we do not have any information
about $B$ and we use $g(B) \propto B^{-2}$ (a limiting case of Strawderman (1971)), then
$E(B \mid X = 0) = n/(n+1) = \lim_{S \to 0} \gamma_n(S)/S$. We note again $p - 2 = 2n$.
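The posterior mean under $g(B) \propto B^{-2}$ can be evaluated directly: the posterior density of $B$ is then proportional to $B^{n-1}\exp(-BS/2)$ on $(0, 1)$, so the mean is a ratio of two one-dimensional integrals of the same form as (1.5). The sketch below uses Simpson quadrature and assumes $n \ge 1$; the function name is illustrative.

```python
import math


def posterior_mean_B(S, n, steps=2000):
    """Posterior mean of B given S when g(B) is proportional to B^(-2).

    Ratio of the integrals of B^n exp(-BS/2) and B^(n-1) exp(-BS/2) over (0, 1),
    computed with composite Simpson quadrature (steps must be even, n >= 1).
    """
    def simpson(f):
        h = 1.0 / steps
        total = f(0.0) + f(1.0)
        for i in range(1, steps):
            total += (4 if i % 2 else 2) * f(i * h)
        return total * h / 3.0

    numerator = simpson(lambda B: B ** n * math.exp(-B * S / 2.0))
    denominator = simpson(lambda B: B ** (n - 1) * math.exp(-B * S / 2.0))
    return numerator / denominator


at_zero = posterior_mean_B(0.0, 2)    # close to n/(n + 1) = 2/3
at_five = posterior_mean_B(5.0, 2)    # smaller, since larger S suggests smaller B
```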

2. Another interesting property is that it makes it easy to put some prior
information about $B$, say $b \in (0, 1]$, into the estimation procedure. One example
using the lower bound is an estimator with $\tau_H(b,S)$ in (1.4). It has a good property
which an estimator in the extended class of Stein estimators defined in (1.3) with
$c = 1$ does not possess. Berger (1982) gave the latter an intuitive justification as being the
Bayes estimator (based on $\theta \sim N_p(0, b^{-1}(1-b)I)$) when the prior guess $b$ is supported
by the data (small $s$), and being a Stein estimator otherwise. Write $B_b^{+}(S) = \min\{b, (p-2)/S\}$
for its shrinkage. The null hypothesis
$B = b$ is rejected if the data turn out to be small (near zero), and we can infer that $B$
is bigger than $b$, but $B_b^{+}(S)$ remains at $b$. This undesirable property of $B_b^{+}(S)$ becomes
severe when $b$ approaches zero. That is, $\lim_{b \to 0} B_b^{+}(S) = 0$ no matter what the data are.
When we have almost zero prior knowledge (an almost uniform distribution on $\theta$),
the extended class of Stein estimators becomes the MLE; thus the risk remains $p$ for any
value of $\theta$. But $B_H(b,S) = (p-2)\tau_H(b,S)/S$ with $\tau_H(b,S)$ in (1.4) gives $\gamma_n(S)/S$ when $b$
approaches zero, and we know that its risk $R(\theta, \delta_H)$ is (uniformly) smaller than that
of the Stein estimator; thus the effect of the lower bound $\gamma_n(S)/S$ is great.
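The contrast between the two shrinkages as $b \to 0$ can be illustrated numerically. In the sketch below, the shrinkage of the extended Stein estimator with $c = 1$ is taken as $\min\{b, 2n/S\}$ and the shrinkage built from (1.4) is written compactly as $\max\{\gamma_n(S)/S, \min\{b, 2n/S\}\}$; the compact form, the series used for $\gamma_n$ and the function names are assumptions made for illustration.

```python
def gamma_n(s, n):
    """gamma_n(s) via gamma_n/(2n) = T/(1 + T), T = sum_{j>=1} (s/2)^j n!/(n+j)!."""
    x, term, tail, j = s / 2.0, 1.0, 0.0, 0
    while True:
        term *= x / (n + j + 1)
        tail += term
        j += 1
        if term < 1e-17 * (1.0 + tail):
            break
    return 2.0 * n * tail / (1.0 + tail)


def B_plus(b, S, n):
    """Shrinkage min{b, 2n/S} of the extended Stein estimator with c = 1."""
    return min(b, 2.0 * n / S)


def B_H(b, S, n):
    """Shrinkage 2n*tau_H(b, S)/S, written as max{gamma_n(S)/S, min{b, 2n/S}}."""
    return max(gamma_n(S, n) / S, min(b, 2.0 * n / S))


# Nearly vacuous prior guess b with p = 4 (n = 1) and S = 3.
tiny_b = 1e-6
shrink_plus = B_plus(tiny_b, 3.0, 1)   # stays at b, so the data are ignored
shrink_H = B_H(tiny_b, 3.0, 1)         # falls back on gamma_n(S)/S
```

Here the extended Stein shrinkage is essentially zero, while the shrinkage based on the lower bound still reacts to the observed $S$.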

3. Another merit of $B_H(b,S)$ is that it gives very stable protection against
misspecification of prior information. This can be explained in terms of the relative
savings loss. Berger (1982) expressed the RSL of estimators in the extended class
of Stein estimators as a function of $B/b$.

Theorem 5. (Berger (1982)). Define $2n = p - 2$ and $\lambda = B/b$, where $B$ and $b$ are the true
and prior values of the hyper-parameter of the normal distribution $N(0, B^{-1}(1-B)I)$. Then

$RSL(B, \delta_b^{+}) = (1 - \lambda^{-1})^2 + A_1(\lambda) \cdot [1 - F_{p+2}(2n\lambda)] + A_2(\lambda) \cdot 4n \cdot f_{p+2}(2n\lambda),$

where

$A_1(\lambda) = 2/p - (1 - \lambda^{-1})^2,$

$A_2(\lambda) = 1/p - 1/((p-2)\lambda),$

and $f_{p+2}(\cdot)$ and $F_{p+2}(\cdot)$ are the pdf and the cdf of the chi-square distribution with $p + 2$ degrees of freedom.

The values of RSL when $p = 4$ for various $\lambda$ are in Table 1. It shows that if $B \ge 3b$
then $RSL(B, \delta_b^{+}) > 0.5 = RSL(B, \delta_{JS})$ while, for any value of $B/b$, $RSL(B, \delta_b^{H}) \le 0.5$.


TABLE 1. $RSL(B, \delta_b^{+})$, $p = 4$

$\lambda$     RSL       $\lambda$     RSL
0             0.5       1.5           0.42
0.1           0.469     3             0.505
0.3           0.423     4             0.584
0.5           0.393     5             0.648
0.7           0.376     10            0.810
0.9           0.370     100           0.980
1.0           0.368     $\infty$      1.0
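The RSL of Theorem 5 is straightforward to evaluate for even $p$, where the chi-square cdf has a closed form. The sketch below takes the third term as $4n\,A_2(\lambda)\,f_{p+2}(2n\lambda)$ with $f_{p+2}$ the chi-square density; this reading is an assumption, chosen because it is what direct integration of $E_{p+2}(\min\{\lambda^{-1}, 2n/T\} - 1)^2$ gives. Function names are illustrative.

```python
import math


def chi2_pdf(x, m):
    """Density of the chi-square distribution with m degrees of freedom."""
    half = m / 2.0
    return x ** (half - 1.0) * math.exp(-x / 2.0) / (2.0 ** half * math.gamma(half))


def chi2_cdf_even(x, m):
    """Cdf of the chi-square distribution for even m (closed-form Poisson sum)."""
    poisson_sum = sum((x / 2.0) ** i / math.factorial(i) for i in range(m // 2))
    return 1.0 - math.exp(-x / 2.0) * poisson_sum


def rsl_extended(lam, p):
    """RSL of the extended Stein estimator with c = 1, as a function of lam = B/b."""
    two_n = p - 2
    a1 = 2.0 / p - (1.0 - 1.0 / lam) ** 2
    a2 = 1.0 / p - 1.0 / (two_n * lam)
    x = two_n * lam
    return ((1.0 - 1.0 / lam) ** 2
            + a1 * (1.0 - chi2_cdf_even(x, p + 2))
            + a2 * 2.0 * two_n * chi2_pdf(x, p + 2))


rsl_at_one = rsl_extended(1.0, 4)     # equals exp(-1), about 0.368
rsl_at_three = rsl_extended(3.0, 4)   # just above the Stein value 2/p = 0.5
```

For $p = 4$ these values round to 0.368 and 0.505, the Table 1 entries at $\lambda = 1$ and $\lambda = 3$.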

4. An analogue of the generalized prior distribution on $\theta$, which Berger (1980)
suggested using, is
It is a heavy-tailed prior, the tail chosen to yield robustness (on the prior). This leads
to the Bayes estimator (1 - The constant $c$ can be interpreted as

$c = B\{1 + \text{prior guess for the common variance in } N_p(0, \tau I)\}.$

But it can be less than $B$; thus the variance term $c/B - 1$ can be negative. To avoid
this difficulty, the range of integration is changed to $B \in (0, c)$ instead of $(0, 1)$. Then

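$\begin{aligned} \pi(\theta) &= \int_0^c (c/B - 1)^{-p/2}\exp\left\{-\frac{\|\theta\|^2}{2(c/B - 1)}\right\}B^{-2}\,dB \\ &\propto \int_0^1 (B^{-1} - 1)^{-p/2}\exp\left\{-\frac{\|\theta\|^2}{2(B^{-1} - 1)}\right\}B^{-2}\,dB \end{aligned}$

and this leads to the Bayes estimator $(1 - \gamma_n(s)/s)X$. We note here that the shrinkage
estimator due to the robust prior distribution is the lower bound of $\tau(s)$.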

5. $\delta_b^H$ has an empirical Bayes property. The fact that $\lim_{n \to \infty} 2n/S = B$ with
probability one is known. Thus with the expression $\gamma_n(S)/2n = 1 - \{\sum_{i=0}^{\infty}(S/2)^i\,\Gamma(n+1)/\Gamma(n+i+1)\}^{-1}$,
it can be shown that $\lim_{n \to \infty} \gamma_n(S)/2n = 1$ with probability one, and these imply that
both bounds $\gamma_n(S)/S$ and $2n/S$ approach the true value of $B$ with probability one
for large $p$ (thus large $n$). So does $B_H(b,S)$. This implies that, for large
$p$, $\delta_b^H(X)$ is very close to the optimal linear estimator $(1 - B)X$.
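The large-$p$ behaviour can be illustrated with a deterministic stand-in for the random $S$. The sketch below replaces $S$ by its marginal mean $p/B = 2(n+1)/B$ (an assumption made only to avoid simulation) and evaluates both bounds with the series expression for $\gamma_n(S)/2n$; function names are hypothetical.

```python
def gamma_n(s, n):
    """gamma_n(s) via gamma_n/(2n) = T/(1 + T), T = sum_{j>=1} (s/2)^j n!/(n+j)!.

    Intended for moderate s (roughly s/2 below 700, to avoid overflow).
    """
    x, term, tail, j = s / 2.0, 1.0, 0.0, 0
    while True:
        term *= x / (n + j + 1)
        tail += term
        j += 1
        if term < 1e-17 * (1.0 + tail):
            break
    return 2.0 * n * tail / (1.0 + tail)


def shrinkage_bounds(n, B):
    """Lower bound gamma_n(S)/S and upper bound 2n/S at S = 2(n + 1)/B."""
    S = 2.0 * (n + 1) / B    # marginal mean of S, since B*S is chi-square with p d.f.
    return gamma_n(S, n) / S, 2.0 * n / S


low_small, high_small = shrinkage_bounds(5, 0.5)
low_large, high_large = shrinkage_bounds(200, 0.5)
```

With $B = 0.5$, both bounds at $n = 200$ are within 0.01 of $B$, whereas at $n = 5$ the lower bound is still visibly below it.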
APPENDIX


1. Properties of $\gamma_n(s)$.

For any $n > 0$ and $s > 0$,

i) $0 < \gamma_n(s) < 2n$.

ii) $\gamma_n(s)$ is increasing in $s$.

iii) $\lim_{s \to 0} \gamma_n(s)/s = n/(n+1)$.

iv) $\gamma_n(s)/s$ is decreasing in $s$.

v) $\lim_{n \to \infty} \gamma_n(s) = s$.

vi) $0 < \gamma_{n+1}(s) - \gamma_n(s) < 2$.

vii) $\gamma_n(s)/2n = F_{2(n+1)}(s)/F_{2n}(s)$, where $F_{2n}(s)$ is the cdf of the chi-square distribution
with $2n$ degrees of freedom.

viii) $\gamma_{n+1}(s) - \gamma_n(s)$ is decreasing in $n$.

ix) $\gamma_{n+1}(s) - \gamma_n(s)$ is increasing in $s$.

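Proof. For parts (i) to (vi), the proof is in Berger (1980). For part (vii),

$\begin{aligned} \frac{\gamma_n(s)}{2n} &= \frac{s}{2n} \cdot \frac{\int_0^1 t^{n}\exp(-ts/2)\,dt}{\int_0^1 t^{n-1}\exp(-ts/2)\,dt} \\ &= \frac{\Gamma(n)2^{n}\int_0^s x^{n}\exp(-x/2)\,dx}{\Gamma(n+1)2^{n+1}\int_0^s x^{n-1}\exp(-x/2)\,dx} = \frac{F_{2(n+1)}(s)}{F_{2n}(s)}. \end{aligned}$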
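For part (viii), suppose there exists some $s$, say $S_0$, such that

$\gamma_{n+2}(S_0) - \gamma_{n+1}(S_0) > \gamma_{n+1}(S_0) - \gamma_n(S_0)$

and this is true for any $n > 0$. We know that $\gamma_{n+1}(S_0) - \gamma_n(S_0) > 0$ for any $n > 0$ from
part (vi). But part (v) gives $\lim_{n \to \infty}(\gamma_{n+2}(S_0) - \gamma_{n+1}(S_0)) = 0$. Therefore, there exists no
such $S_0$, and thus the assertion follows. With this and the fact

$\begin{aligned} \frac{d}{ds}\left(\gamma_{n+1}(s) - \gamma_n(s)\right) &= \frac{1}{2s}\left\{\gamma_{n+1}(s)\Delta_{n+1}(s) - \gamma_n(s)\Delta_n(s)\right\} \\ &\propto \left[\{\gamma_{n+1}(s) - \gamma_n(s)\}\{2 + \gamma_n(s)\} - \gamma_{n+1}(s)\{\gamma_{n+2}(s) - \gamma_{n+1}(s)\}\right] \\ &\ge (\gamma_{n+1}(s) - \gamma_n(s)) - (\gamma_{n+2}(s) - \gamma_{n+1}(s)) \\ &\ge 0 \quad \text{from part (viii),} \end{aligned}$

part (ix) is clear. We note here that $\Delta_n(s) = 2 - \gamma_{n+1}(s) + \gamma_n(s)$.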
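The derivative identity behind the display above, $\gamma_n'(s) = \gamma_n(s)\Delta_n(s)/(2s)$, can be checked by finite differences. The sketch below is a numerical illustration only; it evaluates $\gamma_n$ through the series implied by property (vii), and the helper names and step size are assumptions.

```python
def gamma_n(s, n):
    """gamma_n(s) via gamma_n/(2n) = T/(1 + T), T = sum_{j>=1} (s/2)^j n!/(n+j)!."""
    x, term, tail, j = s / 2.0, 1.0, 0.0, 0
    while True:
        term *= x / (n + j + 1)
        tail += term
        j += 1
        if term < 1e-17 * (1.0 + tail):
            break
    return 2.0 * n * tail / (1.0 + tail)


def delta_n(s, n):
    """Delta_n(s) = 2 - gamma_{n+1}(s) + gamma_n(s)."""
    return 2.0 - gamma_n(s, n + 1) + gamma_n(s, n)


def derivative_formula(s, n):
    """Closed-form derivative gamma_n(s) * Delta_n(s) / (2s)."""
    return gamma_n(s, n) * delta_n(s, n) / (2.0 * s)


def derivative_numeric(s, n, h=1e-5):
    """Central finite difference approximation of d gamma_n / ds."""
    return (gamma_n(s + h, n) - gamma_n(s - h, n)) / (2.0 * h)


largest_error = max(abs(derivative_formula(s, n) - derivative_numeric(s, n))
                    for n in (1, 2, 3) for s in (0.5, 2.0, 5.0, 10.0))
```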